No. The model suggests that roughly 5% of the data should fall outside the range of these bars. That looks compatible with the data we have.
Note: These points are not outliers – they fit the pattern that the model anticipates for the data (at least as far as this plot diagnoses) just fine. The model says there should be some data out there.
In problems 2-6, the point was to draw conclusions from a comparison of the pictures presented.

2. Run the following examples in R. Compare the plots produced and comment on the big idea(s) illustrated by this comparison.
```r
library(CalvinBayes)
BernGrid("H", resolution = 4, prior = triangle::dtriangle)
BernGrid("H", resolution = 10, prior = triangle::dtriangle)
BernGrid("H", prior = 1, resolution = 100, geom = geom_col)
BernGrid("H", resolution = 100,
         prior = function(p) abs(p - 0.5) > 0.48, geom = geom_col)
```
<img src="Redoing-solutions_files/figure-html/BernGrid-01-1.png" width="80%" />
<img src="Redoing-solutions_files/figure-html/BernGrid-01-2.png" width="80%" />
<img src="Redoing-solutions_files/figure-html/BernGrid-01-3.png" width="80%" />
<img src="Redoing-solutions_files/figure-html/BernGrid-01-4.png" width="80%" />
The posterior can only be non-zero where both the prior and the likelihood are non-zero, since the posterior is proportional to prior times likelihood.
In each example we see that flipping a head drops the posterior to 0 for a coin that only gives tails because it is impossible to flip a head with such a coin. The last example shows that if the prior is 0, the posterior will also be 0. If our prior says something is impossible, no amount of data will change our minds. (So generally, we are cautious about having the prior be 0 unless something is truly impossible, not just unlikely or unexpected.)
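The zero-prior effect can be seen with a bare-hands grid update (a sketch using only base R, not `BernGrid()`): wherever the prior is 0, the posterior is 0 no matter what the likelihood says.

```r
# Coarse grid over theta = P(heads); prior is zero except at the two extremes.
theta <- c(0, 0.25, 0.5, 0.75, 1)
prior <- c(1, 0, 0, 0, 1) / 2          # only theta = 0 and theta = 1 allowed
likelihood <- dbinom(1, 1, theta)      # probability of one head: just theta
posterior <- prior * likelihood
posterior <- posterior / sum(posterior)
posterior                              # c(0, 0, 0, 0, 1)
```

After one head, all the credibility lands on theta = 1: the head rules out the always-tails coin, and the prior rules out everything in between.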
Run the following examples in R. Compare the plots produced and comment on the big idea(s) illustrated by this comparison.
```r
library(CalvinBayes)
BernGrid("TTHT", prior = triangle::dtriangle)
BernGrid("TTHT",
         prior = function(x) triangle::dtriangle(x)^0.1)
BernGrid("TTHT",
         prior = function(x) triangle::dtriangle(x)^10)
```
The only thing changing in these examples is the prior; the data are the same each time. The more concentrated the prior, the more it shapes the posterior.
Run the following examples in R. Compare the plots produced and comment on the big idea(s) illustrated by this comparison.
```r
library(CalvinBayes)
dfoo <- function(p) {
  0.02 * dunif(p) +
    0.49 * triangle::dtriangle(p, 0.1, 0.2) +
    0.49 * triangle::dtriangle(p, 0.8, 0.9)
}
BernGrid(c(rep(0, 13), rep(1, 14)), prior = triangle::dtriangle)
BernGrid(c(rep(0, 13), rep(1, 14)), resolution = 1000, prior = dfoo)
```
Comparing the two priors we see that the large concentrations of the foo prior near 0.15 and 0.85 result in a posterior that allocates some credibility there even though the data are suggesting a value near 0.5. Since the triangle distribution does not assign especially large credibility there, those “bumps” near 0.15 and 0.85 are not present in the posterior when we use a triangle prior.
Also note that the posterior bumps for the foo prior are shifted toward the center (because the data are near 50% heads).
Run the following examples in R. Compare the plots produced and comment on the big idea(s) illustrated by this comparison.
```r
library(CalvinBayes)
dfoo <- function(p) {
  0.02 * dunif(p) +
    0.49 * triangle::dtriangle(p, 0.1, 0.2) +
    0.49 * triangle::dtriangle(p, 0.8, 0.9)
}
BernGrid(c(rep(0, 3), rep(1, 3)), prior = dfoo)
BernGrid(c(rep(0, 10), rep(1, 10)), prior = dfoo)
BernGrid(c(rep(0, 30), rep(1, 30)), prior = dfoo)
BernGrid(c(rep(0, 100), rep(1, 100)), prior = dfoo)
```
With little data, the prior has great sway over the posterior. With more and more data, the influence of the prior becomes less and less.
Run the following examples in R and compare them to the ones in the previous exercise. What do you observe?
```r
library(CalvinBayes)
dfoo <- function(p) {
  0.02 * dunif(p) +
    0.49 * triangle::dtriangle(p, 0.1, 0.2) +
    0.49 * triangle::dtriangle(p, 0.8, 0.9)
}
BernGrid(c(rep(0, 3), rep(1, 4)), prior = dfoo)
BernGrid(c(rep(0, 4), rep(1, 3)), prior = dfoo)
BernGrid(c(rep(0, 10), rep(1, 11)), prior = dfoo)
BernGrid(c(rep(0, 11), rep(1, 10)), prior = dfoo)
BernGrid(c(rep(0, 30), rep(1, 31)), prior = dfoo)
BernGrid(c(rep(0, 31), rep(1, 30)), prior = dfoo)
```
A very informative/opinionated prior can turn small differences in a small data set into large differences in the posterior because it pushes the posterior toward one of the regions where the prior is large. But with more and more data, this effect becomes less and less pronounced.
Create a function in R that converts Fahrenheit temperatures to Celsius temperatures. [Hint: \(C = (F-32) \cdot 5/9\).]
What you turn in should show `c(-40, 32, 98.6, 212)` as the input.

```r
to_celsius <- function(temp) {
  (temp - 32) * 5 / 9
}
to_celsius(c(-40, 32, 98.6, 212))
```

```
## [1] -40   0  37 100
```
- Use `function()` to create a function in R that is equivalent to \(p(x)\).
- Use `gf_function()` to plot the function on the interval \([0, 1]\).
- Check that the function is a pdf (using `integrate()`).
- What is the largest value of \(p(x)\)? At what value of \(x\) does it occur? Is it a problem that this value is larger than 1?

Hint: differentiation might be useful.
```r
f <- function(x) {
  (x >= 0) * (x <= 1) * 6 * x * (1 - x)
}
gf_function(f, xlim = c(0, 1))
integrate(f, 0, 1)
```

```
## 1 with absolute error < 1.1e-14
```
\(\frac{df}{dx} = 6 - 12x = 0\) when \(x = 0.5\), and the second derivative is negative, so the maximum value, \(f(0.5) = 1.5\), occurs at \(x = 0.5\). It is not a problem that this value is larger than 1; a pdf can exceed 1 as long as the total area under the curve is 1.
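The calculus can be double-checked numerically (a sketch, redefining `f` so the chunk runs standalone):

```r
f <- function(x) (x >= 0) * (x <= 1) * 6 * x * (1 - x)
opt <- optimize(f, interval = c(0, 1), maximum = TRUE)
opt$maximum    # approximately 0.5
opt$objective  # approximately 1.5
```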
\(B\) is continuous with kernel \(f(x) = x^2(1-x)\) on \([0, 1]\).
Hint: first figure out what the pdf is.
```r
x <- 1:4
p <- function(x) x / 10
# Expected value is sum of values times probabilities
sum(x * p(x))
```

```
## [1] 3
```

```r
k <- function(x) x^2 * (1 - x) * (x >= 0) * (x <= 1)
integrate(k, 0, 1)
```

```
## 0.08333333 with absolute error < 9.3e-16
```

```r
f <- function(x) k(x) / integrate(k, 0, 1)$value
integrate(f, 0, 1)
```

```
## 1 with absolute error < 1.1e-14
```

```r
gf_function(f, xlim = c(0, 1))
# expected value
integrate(function(x) x * f(x), 0, 1)
```

```
## 0.6 with absolute error < 6.7e-15
```
In Bayesian inference, we will often need to come up with a distribution that matches certain features that correspond to our knowledge or intuition about a situation. Find a normal distribution with a mean of 10 such that half of the distribution is within 3 of 10 (ie, between 7 and 13).
Hint: use qnorm() to determine how many standard deviations are between 10 and 7.
```r
library(mosaic)
z <- qnorm(0.75); z
```

```
## [1] 0.6744898
```

```r
sigma <- (13 - 10) / z; sigma
```

```
## [1] 4.447807
```

```r
xpnorm(c(7, 13), mean = 10, sd = sigma)
```

```
## 
## If X ~ N(10, 4.448), then
##  P(X <= 7) = P(Z <= -0.6745) = 0.25   P(X <= 13) = P(Z <= 0.6745) = 0.75
##  P(X >  7) = P(Z >  -0.6745) = 0.75   P(X >  13) = P(Z >  0.6745) = 0.25
## 
## [1] 0.25 0.75
```
School children were surveyed regarding their favorite foods. Of the total sample, 20% were 1st graders, 20% were 6th graders, and 60% were 11th graders. For each grade, the following table shows the proportion of respondents that chose each of three foods as their favorite.
From that information, construct a table of joint probabilities of grade and favorite food.
Are grade and favorite food independent? Explain how you ascertained the answer.
| grade | Ice cream | Fruit | French fries |
|---|---|---|---|
| 1st | 0.3 | 0.6 | 0.1 |
| 6th | 0.6 | 0.3 | 0.1 |
| 11th | 0.3 | 0.1 | 0.6 |
```r
conditional <-
  rbind(c(0.3, 0.6, 0.1), c(0.6, 0.3, 0.1), c(0.3, 0.1, 0.6))
conditional
```

```
##      [,1] [,2] [,3]
## [1,]  0.3  0.6  0.1
## [2,]  0.6  0.3  0.1
## [3,]  0.3  0.1  0.6
```

```r
marginal <-
  cbind(c(0.2, 0.2, 0.6), c(0.2, 0.2, 0.6), c(0.2, 0.2, 0.6))
marginal
```

```
##      [,1] [,2] [,3]
## [1,]  0.2  0.2  0.2
## [2,]  0.2  0.2  0.2
## [3,]  0.6  0.6  0.6
```

```r
joint <- conditional * marginal; joint
```

```
##      [,1] [,2] [,3]
## [1,] 0.06 0.12 0.02
## [2,] 0.12 0.06 0.02
## [3,] 0.18 0.06 0.36
```

```r
sum(joint)
```

```
## [1] 1
```
Grade and favorite food are not independent.
```r
conditional == marginal
```

```
##       [,1]  [,2]  [,3]
## [1,] FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE
## [3,] FALSE FALSE  TRUE
```

```r
all(conditional == marginal)
```

```
## [1] FALSE
```
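Another way to see it (a sketch, rebuilding the joint table so the chunk runs standalone): independence would require every joint probability to equal the product of the corresponding row and column marginals.

```r
joint <- rbind(c(0.06, 0.12, 0.02),
               c(0.12, 0.06, 0.02),
               c(0.18, 0.06, 0.36))
indep_table <- outer(rowSums(joint), colSums(joint))  # what independence predicts
all(abs(joint - indep_table) < 1e-12)                 # FALSE
```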
Three cards are placed in a hat. One is black on both sides, one is white on both sides, and the third is white on one side and black on the other. One card is selected at random from the three cards in the hat and placed on the table.
The top of the card is black.
There are other ways to do this, but here is a way that looks like a Bayesian update of a prior using a likelihood.
```r
expand.grid(
  card = c("BB", "BW", "WW")
) %>%
  mutate(
    prior = 1/3,                 # all equally likely to be drawn
    likelihood = c(1, 0.5, 0),   # chance of seeing black on top
    posterior = prior * likelihood,
    posterior = posterior / sum(posterior)
  )
```

```
##   card     prior likelihood posterior
## 1   BB 0.3333333        1.0 0.6666667
## 2   BW 0.3333333        0.5 0.3333333
## 3   WW 0.3333333        0.0 0.0000000
```
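A simulation (a sketch, not part of the grid solution) agrees with the 2/3 answer:

```r
set.seed(1234)
cards <- list(BB = c("B", "B"), BW = c("B", "W"), WW = c("W", "W"))
n <- 1e5
card <- sample(names(cards), n, replace = TRUE)                  # pick a card
top <- vapply(card, function(nm) sample(cards[[nm]], 1),         # pick a side
              character(1))
mean(card[top == "B"] == "BB")                                   # close to 2/3
```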
The three cards from the previous problem are returned to the hat. Once again a card is pulled and placed on the table, and once again the top is black. This time a second card is drawn and placed on the table. The top side of the second card is white.
```r
Grid <-
  expand.grid(
    card1 = 0:2,   # number of black sides on card
    card2 = 0:2    # number of black sides on card
  ) %>%
  mutate(
    prior0 = ifelse(card1 == card2, 0, 1),
    prior = prior0 / sum(prior0),
    likelihood = card1 / 2 * (2 - card2) / 2,
    posterior = prior * likelihood,
    posterior = posterior / sum(posterior)
  )
Grid %>%
  DT::datatable() %>%
  DT::formatRound(
    columns = ~ prior + likelihood + posterior,
    digits = 3)
```

```r
# answers
# a: card 1 has two black sides
Grid %>% filter(card1 == 2) %>% pull(posterior) %>% sum()
```

```
## [1] 0.75
```

```r
# b: second card has 1 black side
Grid %>% filter(card2 == 1) %>% pull(posterior) %>% sum()
```

```
## [1] 0.25
```

```r
# c: first card has 2 black sides, second has 1 black side
Grid %>% filter(card1 == 2, card2 == 1) %>%
  pull(posterior) %>% sum()
```

```
## [1] 0.25
```
Suppose there are two species of panda, A and B. Without a special blood test, it is not possible to tell them apart. But it is known that half of pandas are of each species and that 10% of births from species A are twins and 20% of births from species B are twins.
If a female panda has twins, what is the probability that she is from species A?
If the same panda later has another set of twins, what is the probability that she is from species A?
A different panda has twins and a year later gives birth to a single panda. What is the probability that this panda is from species A?
Let \(A\) and \(B\) be the events that a panda is species A or B. Let \(T_i\) be the event that the \(i\)th birth is twins.
\[\begin{align*} P(A \mid T_1) &= \frac{P(A \cap T_1)}{P(T_1)} \\ &= \frac{P(A) \cdot P(T_1 \mid A)}{P(A \cap T_1) + P(B \cap T_1)} \\ &= \frac{P(A) \cdot P(T_1 \mid A)}{P(A) \cdot P(T_1 \mid A) + P(B) \cdot P(T_1 \mid B)} \\ &= \frac{0.5 \cdot 0.1}{0.5 \cdot 0.1 + 0.5 \cdot 0.2} = \frac{2}{3} \end{align*}\]
\[\begin{align*} P(A \mid T_1 \cap T_2) &= \frac{P(A \cap T_1 \cap T_2)}{P(T_1 \cap T_2)} \\ &= \frac{P(A) \cdot P(T_1 \cap T_2 \mid A)} {P(A \cap T_1 \cap T_2) + P(B \cap T_1 \cap T_2)} \\ &= \frac{P(A) P(T_1 \cap T_2 \mid A)} {P(A) \cdot P(T_1 \cap T_2 \mid A) + P(B) \cdot P(T_1 \cap T_2 \mid B)} \\ &= \frac{0.5 \cdot 0.1 \cdot 0.1}{0.5 \cdot 0.1 \cdot 0.1 + 0.5 \cdot 0.2 \cdot 0.2} = 0.2 \end{align*}\]
\[\begin{align} P(A \mid T_1 \cap T_2) &= \frac{P(A \cap T_1 \cap T_2)}{P(T_1 \cap T_2)} = \frac{P(A \cap T_1 \cap T_2)}{P(A \cap T_1 \cap T_2) + P(B \cap T_1 \cap T_2)} \\ &= \frac{P(T_1) \cdot P(A \mid T_1) \cdot P(T_2 \mid A \cap T_1)} {P(T_1) \left[ P(A \mid T_1) \cdot P(T_2 \mid A \cap T_1) + P(B \mid T_1) \cdot P(T_2 \mid B \cap T_1) \right]} \\ &= \frac{P(A \mid T_1) \cdot P(T_2 \mid A)} {P(A \mid T_1) \cdot P(T_2 \mid A) + P(B \mid T_1) \cdot P(T_2 \mid B)} \\ &= \frac{\frac13 \cdot 0.1}{\frac13 \cdot 0.1 + \frac23 \cdot 0.2} = 0.2 \end{align}\] Notice that this says that we can use the posterior after observing the first twins as a prior for the observation of the second twins (ignoring the first twins at this point because the information from that event is now included in the new prior). This works because \(P(T_2 \mid A \cap T_1) = P(T_2 \mid A)\), which follows from the assumption that \(P(T_1 \cap T_2 \mid A) = P(T_1 \mid A) \cdot P(T_2 \mid A)\) (ie, that for each species twins in the first birth are independent of twins in the second):
\[\begin{align*} P(T_2 \mid A \cap T_1) &= \frac{P(T_1 \cap A \cap T_2)} {P(A \cap T_1)} \\ &= \frac{P(A) \cdot P(T_1 \cap T_2 \mid A)} {P(A) \cdot P(T_1 \mid A)} \\ &= \frac{P(A) \cdot P(T_1 \mid A) \cdot P(T_2 \mid A)} {P(A) \cdot P(T_1 \mid A)} \\ &= P(T_2 \mid A) \end{align*}\] A similar statement holds with \(B\) in place of \(A\) by the same argument.
Now let \(T_2\) be the event that the second birth is NOT twins; the same algebra as above leads to the following arithmetic:
\[\begin{align*} \frac{0.5 \cdot 0.1 \cdot 0.9}{0.5 \cdot 0.1 \cdot 0.9 + 0.5 \cdot 0.2 \cdot 0.8} &= 0.36 \\ \frac{\frac13 \cdot 0.9}{\frac13 \cdot 0.9 + \frac23 \cdot 0.8} &= 0.36 \end{align*}\]
```r
Panda_Grid1 <-
  expand.grid(
    species = c("A", "B")
  ) %>%
  mutate(
    prior = c(0.5, 0.5),
    likelihood = c(0.1, 0.2),
    posterior = prior * likelihood,
    posterior = posterior / sum(posterior)
  )
Panda_Grid1
```

```
##   species prior likelihood posterior
## 1       A   0.5        0.1 0.3333333
## 2       B   0.5        0.2 0.6666667
```

```r
Panda_Grid2 <-
  expand.grid(
    species = c("A", "B")
  ) %>%
  mutate(
    prior = c(0.5, 0.5),
    likelihood = c(0.1 * 0.1, 0.2 * 0.2),
    posterior = prior * likelihood,
    posterior = posterior / sum(posterior)
  )
Panda_Grid2
```

```
##   species prior likelihood posterior
## 1       A   0.5       0.01       0.2
## 2       B   0.5       0.04       0.8
```

```r
Panda_Grid2a <-
  Panda_Grid1 %>%
  mutate(
    prior = posterior,
    posterior = prior * likelihood,
    posterior = posterior / sum(posterior)
  )
Panda_Grid2a
```

```
##   species     prior likelihood posterior
## 1       A 0.3333333        0.1       0.2
## 2       B 0.6666667        0.2       0.8
```

```r
Panda_Grid3 <-
  expand.grid(
    species = c("A", "B")
  ) %>%
  mutate(
    prior = c(0.5, 0.5),
    likelihood = c(0.1 * 0.9, 0.2 * 0.8),
    posterior = prior * likelihood,
    posterior = posterior / sum(posterior)
  )
Panda_Grid3
```

```
##   species prior likelihood posterior
## 1       A   0.5       0.09      0.36
## 2       B   0.5       0.16      0.64
```

```r
Panda_Grid3a <-
  Panda_Grid1 %>%
  mutate(
    prior = posterior,
    likelihood = c(0.9, 0.8),
    posterior = prior * likelihood,
    posterior = posterior / sum(posterior)
  )
Panda_Grid3a
```

```
##   species     prior likelihood posterior
## 1       A 0.3333333        0.9      0.36
## 2       B 0.6666667        0.8      0.64
```
More Pandas. You get more interested in pandas and learn that at your favorite zoo, 70% of pandas are species A and 30% are species B. You learn that one of the pandas has twins.
What is the probability that the panda is species A?
The same panda has a single panda the next year. Now what is the probability that the species is A?
```r
Panda_Grid1 <-
  expand.grid(
    species = c("A", "B")
  ) %>%
  mutate(
    prior = c(0.7, 0.3),
    likelihood = c(0.1, 0.2),
    posterior = prior * likelihood,
    posterior = posterior / sum(posterior)
  )
Panda_Grid1
```

```
##   species prior likelihood posterior
## 1       A   0.7        0.1 0.5384615
## 2       B   0.3        0.2 0.4615385
```

```r
Panda_Grid2 <-
  expand.grid(
    species = c("A", "B")
  ) %>%
  mutate(
    prior = c(0.7, 0.3),
    likelihood = c(0.1 * 0.9, 0.2 * 0.8),
    posterior = prior * likelihood,
    posterior = posterior / sum(posterior)
  )
Panda_Grid2
```

```
##   species prior likelihood posterior
## 1       A   0.7       0.09 0.5675676
## 2       B   0.3       0.16 0.4324324
```
Always show your work. If I can’t tell how you got your answer, you probably won’t get full credit.
For R code, follow the style guide. It is not optional.
Discussion should not be in R comments. Use text for that.
Take advantage of the ability to typeset mathematical notation in R Markdown.
Leave some margins (on all sides). And leave some space between problems.
Your goal should be that I can read your work easily. The more time I have to spend just figuring out what you are doing/saying, the less your work is worth. Communication is an important skill. Here’s a chance to practice.
Some comments on probability based on recent homework.
Use good notation.
If you create notation for events, random variables, etc., explain it. A good template is “Let _____ be ________”. The first blank is your notation and the second is what it means.
\(P(A \cap B)\), \(P(A \mid B)\), and \(P(B \mid A)\) mean different things. Be sure to use the correct one.
Use of the equally likely rule requires justification – how do you know the outcomes are equally likely?
Don’t make up probability rules. If we don’t have the rule you want to use, take a moment to show that the rule is valid. (If you can’t, perhaps it isn’t really a rule after all.)
Suppose we have a test with a 97% specificity and a 99% sensitivity just like in Section @ref(discrete-params). Now suppose that a random person is selected, has a first test that is positive, then is retested and has a second test that is negative.
Taking into account both tests, and assuming the results of the two tests are independent, what is the probability that the person has the disease?
Hint: We can use the posterior after the first test as a prior for the second test. Be sure to keep as many decimal digits as possible (use R and don’t round intermediate results).
Note: In this problem we are assuming that the results of the two tests are independent, which might not be the case for some medical tests.
```r
expand.grid(
  status = c("healthy", "diseased")
) %>%
  mutate(
    prior1 = c(999/1000, 1/1000),
    like1 = c(0.03, 0.99),
    post1 = prior1 * like1,
    post1 = post1 / sum(post1),
    prior2 = post1,
    like2 = c(0.97, 0.01),
    post2 = prior2 * like2,
    post2 = post2 / sum(post2)
  )
```

```
##     status prior1 like1      post1     prior2 like2        post2
## 1  healthy  0.999  0.03 0.96802326 0.96802326  0.97 0.9996595692
## 2 diseased  0.001  0.99 0.03197674 0.03197674  0.01 0.0003404308
```
Consider again the disease and diagnostic test of the previous exercise and Section @ref(discrete-params).
Suppose that a person selected at random from the population gets the test and it comes back negative. Compute the probability that the person has the disease.
How does the result compare with your answer in the previous exercise?
```r
expand.grid(
  status = c("healthy", "diseased")
) %>%
  mutate(
    prior1 = c(999/1000, 1/1000),
    like1 = c(0.97, 0.01),
    post1 = prior1 * like1,
    post1 = post1 / sum(post1),
    prior2 = post1,
    like2 = c(0.03, 0.99),
    post2 = prior2 * like2,
    post2 = post2 / sum(post2)
  )
```

```
##     status prior1 like1        post1       prior2 like2        post2
## 1  healthy  0.999  0.97 9.999897e-01 9.999897e-01  0.03 0.9996595692
## 2 diseased  0.001  0.01 1.031949e-05 1.031949e-05  0.99 0.0003404308
```
Modify `MyBernGrid()` so that it takes an argument specifying the probability for the HDI. Use it to create a plot showing a 50% HDI for theta using a symmetric triangle prior and data consisting of 3 successes and 5 failures.

```r
MyBernGrid2 <- function(
  x, n,                # x successes in n tries
  prior = dunif,       # function for prior
  resolution = 1000,   # number of intervals to use for grid
  prob = 0.95,         # probability for HDI
  ...) {               # additional arguments for prior
  Grid <-
    expand.grid(
      theta = seq(0, 1, length.out = resolution + 1)
    ) %>%
    mutate(   # saving only the normalized version of each
      prior = prior(theta, ...),
      prior = prior / sum(prior) * resolution,
      likelihood = dbinom(x, n, theta),
      likelihood = likelihood / sum(likelihood) * resolution,
      posterior = prior * likelihood,
      posterior = posterior / sum(posterior) * resolution
    )
  H <- hdi_from_grid(Grid, pars = "theta", prob = prob)
  gf_line(prior ~ theta, data = Grid, color = ~"prior",
          size = 1.15, alpha = 0.8) %>%
    gf_line(likelihood ~ theta, data = Grid, color = ~"likelihood",
            size = 1.15, alpha = 0.7) %>%
    gf_line(posterior ~ theta, data = Grid, color = ~"posterior",
            size = 1.15, alpha = 0.6) %>%
    gf_pointrangeh(
      height ~ mode + lo + hi, data = H,
      color = "red", size = 1) %>%
    gf_labs(title = "Prior/Likelihood/Posterior",
            subtitle = paste("Data: n =", n, ", x =", x),
            caption = paste0(100 * prob, "% HDI shown in red")) %>%
    gf_refine(
      scale_color_manual(
        values = c(
          "prior" = "forestgreen",
          "likelihood" = "blue",
          "posterior" = "red"),
        breaks = c("prior", "likelihood", "posterior")
      )) %>%
    print()
  invisible(Grid)   # return the Grid, but don't show it
}
```

```r
library(triangle)
MyBernGrid2(10, 20, prior = dtriangle, prob = 0.5, a = 0, b = 1, c = 0.80)
```
Let’s try the grid method for a model with two parameters. Suppose we want to estimate the mean and standard deviation of the heights of 21-year-old American men or women (your choice which group). First, let’s get some data.
```r
library(NHANES)
Men <- NHANES %>% filter(Gender == "male", Age == 21)
Women <- NHANES %>% filter(Gender == "female", Age == 21)
```
Likelihood. Our model is that heights are normally distributed with mean \(\mu\) and standard deviation \(\sigma\).
Prior. For our prior, let’s use something informed just a little bit by what we know about people (we could do better with these priors, but that’s not our goal at the moment):
So our model is \[\begin{align*} \mathrm{Height} & \sim {\sf Norm}(\mu, \sigma) \\ \mu & \sim {\sf Unif}(150, 200) \\ \sigma & \sim {\sf Unif}(0, 20) \end{align*}\]
Grid. Use a grid that has 200-500 values for each parameter. Fill in the ?? below to create and update your grid.
Notes:
```r
library(purrr)
Height_Grid <-
  expand.grid(
    mu = seq(??, ??, ?? = ??),
    sigma = seq(??, ??, ?? = ??)
  ) %>%
  filter(sigma != 0) %>%   # remove sigma = 0
  mutate(
    prior = ??,
    logprior = ??,
    loglik =
      map2_dbl(mu, sigma,
               ~ ??),   # use .x for mu and .y for sigma
    logpost = logprior + loglik,
    posterior = exp(logpost)
  )
```
Once you have created and updated your grid, you can visualize your posterior using a command like this (use gf_lims() if you want to zoom in a bit):
```r
gf_tile(posterior ~ mu + sigma, data = Height_Grid) %>%
  gf_contour(posterior ~ mu + sigma, data = Height_Grid, color = "yellow")
```
Now answer the following questions.
Using the picture you just created, what would you say are credible values for \(\mu\) and for \(\sigma\)?
Now use hdi_from_grid() to compute 90% highest density (posterior) intervals for each parameter. Do those make sense when you compare to the picture?
Create a plot showing the posterior distribution for \(\mu\) and a 90% HDI for \(\mu\). [Hint: how can you get a grid for the marginal distribution of \(\mu\) from the grid for \(\mu\) and \(\sigma\)?]
```r
library(purrr)
```

```
## 
## Attaching package: 'purrr'
## The following object is masked from 'package:mosaic':
## 
##     cross
```

```r
Height_Grid <-
  expand.grid(
    mu = seq(150, 200, by = 0.1),
    sigma = seq(0, 20, by = 0.1)
  ) %>%
  filter(sigma > 0) %>%   # can't have sigma be 0
  mutate(
    prior = 1,
    logprior = 0,
    loglik =
      map2_dbl(mu, sigma,
               ~ sum(dnorm(Men$Height, mean = .x, sd = .y, log = TRUE))),
    logpost = logprior + loglik,
    posterior = exp(logpost),
    posterior1 = posterior / sum(posterior, na.rm = TRUE) / (0.1 * 0.1)
  )
```
```r
gf_tile(posterior ~ mu + sigma, data = Height_Grid) %>%
  gf_contour(posterior ~ mu + sigma, data = Height_Grid, color = "yellow") %>%
  gf_lims(x = c(172, 180), y = c(7, 9))
```

```
## Warning: Removed 98499 rows containing non-finite values (stat_contour).
## Warning: Removed 98499 rows containing missing values (geom_tile).
```
```r
hdi_from_grid(Height_Grid, pars = c("mu", "sigma"), prob = 0.9)
```

```
## # A tibble: 2 x 7
##   param    lo    hi  prob height mode_height  mode
##   <chr> <dbl> <dbl> <dbl>  <dbl>       <dbl> <dbl>
## 1 mu    174.  177.  0.906  0.109       0.450 176. 
## 2 sigma   6.9   9   0.914  0.151       0.630   7.8
```
```r
Mu_Grid <-
  Height_Grid %>%
  group_by(mu) %>%
  summarise(posterior = sum(posterior1, na.rm = TRUE)) %>%
  mutate(posterior1 = posterior / sum(posterior) / 0.1)
gf_line(posterior1 ~ mu, data = Mu_Grid) %>%
  gf_pointrangeh(height ~ mode + lo + hi,
                 data = hdi_from_grid(Height_Grid, pars = "mu", prob = 0.9),
                 color = "red")
```
```r
library(purrr)
Height_Grid <-
  expand.grid(
    mu = seq(150, 200, by = 0.1),
    sigma = seq(0, 20, by = 0.1)
  ) %>%
  filter(sigma > 0) %>%   # can't have sigma be 0
  mutate(
    prior = dtriangle(mu, 150, 200, 180) * dtriangle(sigma, 0, 20, 5),
    logprior = log(prior),   # log of the triangle prior
    loglik =
      map2_dbl(mu, sigma,
               ~ sum(dnorm(Men$Height, mean = .x, sd = .y, log = TRUE))),
    logpost = logprior + loglik,
    posterior = exp(logpost),
    posterior1 = posterior / sum(posterior, na.rm = TRUE) / (0.1 * 0.1)
  )
hdi_from_grid(Height_Grid, pars = c("mu", "sigma"), prob = 0.9)
```

```
## # A tibble: 2 x 7
##   param    lo    hi  prob height mode_height  mode
##   <chr> <dbl> <dbl> <dbl>  <dbl>       <dbl> <dbl>
## 1 mu    174.  177.  0.906  0.109       0.450 176. 
## 2 sigma   6.9   9   0.914  0.151       0.630   7.8
```

```r
gf_tile(posterior ~ mu + sigma, data = Height_Grid) %>%
  gf_contour(posterior ~ mu + sigma, data = Height_Grid, color = "yellow") %>%
  gf_lims(x = c(172, 180), y = c(7, 9))
```

```
## Warning: Removed 98499 rows containing non-finite values (stat_contour).
## Warning: Removed 98499 rows containing missing values (geom_tile).
```

```r
Mu_Grid <-
  Height_Grid %>%
  group_by(mu) %>%
  summarise(posterior = sum(posterior1, na.rm = TRUE)) %>%
  mutate(posterior1 = posterior / sum(posterior) / 0.1)
gf_line(posterior1 ~ mu, data = Mu_Grid) %>%
  gf_pointrangeh(height ~ mode + lo + hi,
                 data = hdi_from_grid(Height_Grid, pars = "mu", prob = 0.9),
                 color = "red")
```
Bob plays basketball.
He shoots 70% of his shots from 2-point range and makes 48% of these shots. He shoots 30% of his shots from 3-point range and makes 32% of these shots. Bob just made a shot. What is the probability that it was a 3-point shot?
Do this problem twice. The first time, use probability rules, carefully denoting the probabilities involved. (For any number that appears, I should be able to tell from your notation where it came from.) The second time, use the “grid method”.
Here’s the grid method.
```r
expand.grid(
  shot = c(2, 3)
) %>%
  mutate(
    prior = c(0.7, 0.3),
    likelihood = c(0.48, 0.32),
    posterior = prior * likelihood,
    posterior = posterior / sum(posterior)
  )
```

```
##   shot prior likelihood posterior
## 1    2   0.7       0.48 0.7777778
## 2    3   0.3       0.32 0.2222222
```
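The probability-rules version (a sketch, not in the original solution) reduces to one line of arithmetic, with the total probability of making a shot in the denominator:

```r
# P(3 | made) = P(3) P(made | 3) / [ P(2) P(made | 2) + P(3) P(made | 3) ]
p_made <- 0.7 * 0.48 + 0.3 * 0.32   # overall probability Bob makes a shot
0.3 * 0.32 / p_made                 # 0.2222..., matching the grid result
```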
Alice has 3 hats labeled with the letters H, A, and T. In each hat are marbles of various colors.
| Hat | White marbles | Red marbles | Yellow marbles |
|---|---|---|---|
| H | 4 | 10 | 6 |
| A | 6 | 12 | 2 |
| T | 5 | 3 | 2 |
Alice randomly selects a hat by flipping two coins. If both are heads, she chooses hat H. If both are tails, she chooses hat T. If there is one head and one tail, she chooses hat A. Once that hat is selected, she draws out two marbles.
If the two marbles are both white, what is the probability that the hat was hat A?
If there is one red marble and one yellow marble, what is the probability that the hat was hat A?
If the two marbles are the same color, what is the probability that the hat was hat A?
```r
expand.grid(
  hat = c("H", "A", "T")
) %>%
  mutate(
    prior = c(1/4, 1/2, 1/4),
    l1 = c(4/20 * 3/19, 6/20 * 5/19, 5/10 * 4/9),
    l2 = c(2 * 10/20 * 6/19, 2 * 12/20 * 2/19, 2 * 3/10 * 2/9),
    l3 = c(4/20 * 3/19 + 10/20 * 9/19 + 6/20 * 5/19,
           6/20 * 5/19 + 12/20 * 11/19 + 2/20 * 1/19,
           5/10 * 4/9 + 3/10 * 2/9 + 2/10 * 1/9),
    post1 = prior * l1 / sum(prior * l1),
    post2 = prior * l2 / sum(prior * l2),
    post3 = prior * l3 / sum(prior * l3)
  ) %>%
  filter(hat == "A")
```

```
##   hat prior         l1        l2        l3     post1 post2    post3
## 1   A   0.5 0.07894737 0.1263158 0.4315789 0.3835227  0.36 0.567256
```
Suppose we have a coin that we know comes from a magic-trick store, and therefore we believe that the coin is strongly biased either usually to come up heads or usually to come up tails, but we don’t know which.
Express this belief as a beta prior. That is, find shape parameters that lead to a beta distribution that corresponds to this belief.
Now we flip the coin 5 times and it comes up heads in 4 of the 5 flips. What is the posterior distribution?
Use quick_bern_beta() or a similar function of your own creation to show the prior and posterior graphically.
We need a beta distribution with both shape parameters less than 1. For example:
```r
# prior probability that heads happens in <= 10% of flips
# about 1/3 <= 10%, 1/3 >= 90%, and 1/3 between 10% and 90%
pbeta(0.10, shape1 = 0.2, shape2 = 0.2)
```

```
## [1] 0.3366898
```

```r
gf_dist("beta", shape1 = 0.2, shape2 = 0.2)
```
The posterior in this case would be Beta(0.2 + 4, 0.2 + 1) = Beta(4.2, 1.2).
```r
library(CalvinBayes)
quick_bern_beta(x = 4, n = 5, a = 0.2, b = 0.2) %>%
  gf_dist("beta", shape1 = 4.2, shape2 = 1.2, color = "red",
          linetype = "dashed") %>%
  gf_lims(y = c(0, 10))
```
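As a sanity check on the conjugate update (a sketch using base R only), a grid posterior built from the Beta(0.2, 0.2) prior essentially coincides with dbeta(theta, 4.2, 1.2):

```r
theta <- seq(0.001, 0.999, by = 0.001)
post <- dbeta(theta, 0.2, 0.2) * dbinom(4, 5, theta)  # prior times likelihood
post <- post / sum(post) / 0.001                      # normalize to a density
max(abs(post - dbeta(theta, 4.2, 1.2)))               # small
```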
Show that if \(\alpha, \beta > 1\), then the mode of a Beta(\(\alpha\), \(\beta\)) distribution is \(\displaystyle {\frac{\alpha -1}{\alpha +\beta -2}}\).
Hint: What would you do if you were in Calculus I?
Find the mode by finding where the derivative of the pdf is 0. You can make things a bit easier if you take the log of the pdf first.
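The formula can be spot-checked numerically (a sketch, choosing \(\alpha = 3\), \(\beta = 5\) arbitrarily):

```r
a <- 3; b <- 5
formula_mode <- (a - 1) / (a + b - 2)   # 1/3 by the formula
numeric_mode <-
  optimize(function(x) dbeta(x, a, b),
           interval = c(0, 1), maximum = TRUE)$maximum
c(formula_mode, numeric_mode)           # both approximately 0.3333
```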
Suppose we have a coin that we know comes from a magic-trick store, and therefore we believe that the coin is strongly biased either usually to come up heads or usually to come up tails, but we don’t know which.
Express this belief as a beta prior. That is, find shape parameters that lead to a beta distribution that corresponds to this belief.
Now we flip the coin 5 times and it comes up heads in 4 of the 5 flips. What is the posterior distribution?
Use quick_bern_beta() or a similar function of your own creation to show the prior and posterior graphically.
The prior should choose both shape parameters to be less than 1.
If we want the prior to say that it is equally likely that the bias is less than 10%, more than 90%, or between 10% and 90%, then \({\sf Beta}(0.2, 0.2)\) is about right. Since the prior gets quite large right near 0 and 1, choosing custom limits for the y axis makes for a better picture.
```r
pbeta(0.10, 0.2, 0.2)
```

```
## [1] 0.3366898
```

```r
quick_bern_beta(4, 5, a = 0.2, b = 0.2) %>% gf_lims(y = c(0, 5))
```
Suppose we estimate a proportion \(\theta\) using a \({\sf Beta}(10, 10)\) prior and observe 26 successes and 48 failures.

a. What is the posterior distribution?
b. What is the posterior mean?
c. What is the posterior mode?
d. Compute a 90% HDI. [Hint: qbeta()]

Most of this is straightforward from the formulas. Part d is tricky to do just right.
a. Beta(36, 58)
b. mean is 36/(36 + 58) = 0.3829787
c. mode is 35/(35 + 57) = 0.3804348
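These values can be double-checked numerically (a sketch):

```r
# posterior mean of Beta(36, 58), by numerical integration
integrate(function(x) x * dbeta(x, 36, 58), 0, 1)$value   # 0.3829787
# posterior mode, by numerical optimization
optimize(function(x) dbeta(x, 36, 58), c(0, 1), maximum = TRUE)$maximum
```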
```r
# this is close
almost <- qbeta(c(0.05, 0.95), 36, 58); almost
```

```
## [1] 0.3023307 0.4664734
```

```r
# but heights are not exactly the same
dbeta(almost, 36, 58)
```

```
## [1] 2.219110 1.986591
```

```r
# so we need to shift the interval to the left (lowering the density at the
# left end and raising it at the right end)
# this function is 0 when we find the correct probability for the left side
left_prob <- function(p) {
  dbeta(qbeta(p, 36, 58), 36, 58) -
    dbeta(qbeta(0.90 + p, 36, 58), 36, 58)
}
uniroot(left_prob, c(0, 0.05))
```

```
## $root
## [1] 0.04647647
## 
## $f.root
## [1] 9.587978e-05
## 
## $iter
## [1] 3
## 
## $init.it
## [1] NA
## 
## $estim.prec
## [1] 6.103516e-05
```

```r
left_p <- uniroot(left_prob, c(0, 0.05))$root
HDI <- qbeta(c(left_p, 0.90 + left_p), 36, 58); HDI
```

```
## [1] 0.3006980 0.4647477
```

```r
pbeta(HDI, 36, 58) %>% diff()
```

```
## [1] 0.9
```

```r
gf_dist("beta", shape1 = 36, shape2 = 58) %>%
  gf_segment(
    dbeta(HDI[1], 36, 58) + dbeta(HDI[2], 36, 58) ~ HDI[1] + HDI[2],
    color = "red") %>%
  gf_segment(
    dbeta(almost[1], 36, 58) + dbeta(almost[2], 36, 58) ~ almost[1] + almost[2],
    color = "blue")
```
Suppose a state-wide election is approaching, and you are interested in knowing whether the general population prefers the democrat or the republican. There is a just-published poll in the newspaper, which states that of 100 randomly sampled people, 58 preferred the republican and the remainder preferred the democrat.
Suppose that before the newspaper poll, your prior belief was a uniform distribution. What is the 95% HDI on your beliefs after learning of the newspaper poll results?
Based on what you know about elections, why is a uniform prior not a great choice? Repeat part (a) with a prior that conforms better to what you know about elections. How much does the change of prior affect the 95% HDI?
You find another poll conducted by a different news organization. In this second poll, 56 of 100 people preferred the republican. Assuming that peoples’ opinions have not changed between polls, what is the 95% HDI on the posterior taking both polls into account? Make it clear which prior you are using.
Based on this data (and your choice of prior, and assuming public opinion doesn’t change between the time of the polls and election day), what is the probability that the republican will win the election?
You have some choices in this problem, so there isn’t one correct answer. Things I especially looked for were:
Your justification for your prior. It should indicate (a) that you understand what the prior is and (b) that you have chosen something that reflects how elections turn out.
The posterior probability of a republican winning is the probability in the posterior distribution that \(\theta > 0.5\). For some reason, several of you reported the mode, which has almost nothing to do with this probability.
```r
# part d:
# Use a Beta(10, 10) prior.
# Beta(50, 50) or Beta(100, 100) might be even better.
# Then the posterior is Beta(10 + 58 + 56, 10 + 42 + 44)
# and the posterior probability that theta > 50% is
1 - pbeta(0.5, 10 + 58 + 56, 10 + 42 + 44)
```

```
## [1] 0.9708828
```
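For part (a), here is a sketch: with a uniform (Beta(1, 1)) prior, the posterior after the first poll is Beta(59, 43). That posterior is nearly symmetric, so the central 95% interval from qbeta() is a good approximation to the 95% HDI.

```r
# posterior after 58 of 100 prefer the republican, starting from Beta(1, 1)
qbeta(c(0.025, 0.975), 1 + 58, 1 + 42)   # roughly 0.48 to 0.67
```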